Network Working Group                         Richard Schantz (BBN-TENEX)
Request for Comments: 672                                        Dec 1974
NIC #31440



                  A Multi-Site Data Collection Facility


Preface:

     This RFC reproduces most of a working document prepared during
the design and implementation of the protocols for the TIP-TENEX
integrated system for handling TIP accounting.  Bernie Cosell
(BBN-TIP) and Bob Thomas (BBN-TENEX) have contributed to various
aspects of this work.  The system has been partially operational for
about a month on selected hosts.  We feel that the techniques
described here have wide applicability beyond TIP accounting.


                               Section I

         Protocols for a Multi-site Data Collection Facility


Introduction

     The development of computer networks has provided the groundwork
for distributed computation, in which a job or task is composed of
components from various computer systems.  In a single computer
system, the unavailability or malfunction of any of the job
components (e.g. program, file, device, etc.) usually necessitates
job termination.  With computer networks, it becomes feasible to
duplicate certain job components which previously had no basis for
duplication.  (In a single system, it does not matter how many times
a process that performs a certain function is duplicated; a system
crash makes all copies unavailable.)  It is such resource duplication
that enables us to utilize the network to achieve high reliability
and load leveling.  In order to realize the potential of resource
duplication, it is necessary to have protocols which provide for the
orderly use of these resources.  In this document, we first discuss
in general terms a problem of protocol definition for interacting
with a multiply defined resource (server).  The problem deals with
providing a highly reliable data collection facility, by supporting
it at many sites throughout the network.  In the second section of
this document, we describe in detail a particular implementation of
the protocol which handles the problem of utilizing multiple data
collector processes for collecting accounting data generated by the
network TIPs.  This example also illustrates the specialization of
hosts to perform the parts of a computation they are best equipped
to handle.  The large network hosts (TENEX systems) perform the
accounting function for the small network access TIPs.

     The situation to be discussed is the following: a data
generating process needs to use a data collection service which is
provided in duplicate by processes on a number of network machines.
A request to a server involves sending the data to be collected.


An Initial Approach


     The data generator could proceed by selecting a particular
server and sending its request to that server.  It might also take
the attitude that if the message reaches the destination host (the
communication subsystem will indicate this), the message will be
properly processed to completion.  Failure of the request message
would then lead to selecting another server, until the request
succeeds or all servers have been tried.

     Such a simple strategy is a poor one.  It makes sense to require
that the servicing process send a positive acknowledgement to the
requesting process.  If nothing else, the reply indicates that the
server process itself is still functioning.  Waiting for such a reply
also implies that there is a strategy for selecting another server if
the reply is not forthcoming.  Herein lies a problem.  If the
expected reply times out, and a new request is then sent to another
server, we run the risk of receiving the (delayed) original
acknowledgement at a later time.  This could result in having the
data entered into the collection system twice (data duplication).  If
the request is re-transmitted to the same server only, we face the
possibility of not being able to access a collector (data loss).  In
addition, for load leveling purposes, we may wish to send new
requests to some (or all) servers.  We can then use their reply (or
lack of reply) as an indicator of the load on that particular
instance of the service.  Doing this without data duplication
requires more than a simple request and acknowledgement protocol*.

--------------------

* If the servers are independent of each other to the extent that if
two or more servers all act on the same request, the end result is
the same as having a single server act on the request, then a simple
request/acknowledgement protocol is adequate.  Such may be the case,
for example, if we subject the totality of collected data (i.e. all
data collected by all collectors for a certain period) to a duplicate
detection scan.  If we could store enough context in each entry to be
able to determine duplicates, then having two or more servers act on
the data would be functionally equivalent to processing by a single
server.


Extension of the Protocol


     The general protocol developed to handle multiple collection
servers involves having the data generator send the data request to
some (or all) data collectors.  Those willing to handle the request
reply with an "I've got it" message.  They then await further
notification before finalizing the processing of the data.  The data
generator sends a "go ahead" message to one of the replying
collectors, and a "discard" message to all other replying collectors.
The "go ahead" message is the signal to process the data (i.e.
collect permanently), while the "discard" message indicates that the
data is being collected elsewhere and should not be retained.
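
     To make the exchange concrete, the following short sketch gives
the data generator's side of the protocol.  It is illustrative only:
the names offer_data, send, and replies_within are hypothetical
stand-ins for transport primitives that this document does not
specify.

    # A minimal sketch of the generator's side, under assumed
    # primitives: send(collector, kind, data) transmits one message,
    # and replies_within(timeout) yields the collectors that have
    # answered "I've got it" so far.

    def offer_data(data, collectors, send, replies_within, timeout):
        for c in collectors:
            send(c, "DATA", data)            # request: here is the data
        repliers = replies_within(timeout)   # those willing to handle it
        if not repliers:
            return None                      # no taker; caller retries later
        chosen = repliers[0]
        send(chosen, "GO AHEAD", data)       # exactly one collector commits
        for c in repliers[1:]:
            send(c, "DISCARD", data)         # all others drop their copies
        return chosen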

     The question now arises as to whether or not the collector
process should acknowledge receipt of the "go ahead" message with a
reply of its own, and then whether the generator process should
acknowledge this acknowledgement, etc.  We would like to send as few
messages as possible while achieving reliable communication.
Therefore, when a state is reached for which further acknowledgements
lead to a previously visited state, or when the cost of further
acknowledgements outweighs the increase in reliability they bring,
further acknowledgements become unnecessary.

     The initial question was: should the collector process
acknowledge the "go ahead" message?  Assume for the moment that it
should not send such an acknowledgement.  The data generator could
verify, through the communication subsystem, the transmission of the
"go ahead" message to the host of the collector.  If this message did
not arrive correctly, the generator has the option of re-transmitting
it or sending a "go ahead" to another collector which has
acknowledged receipt of the data.  Either strategy involves no risk
of duplication.  If the "go ahead" message arrives correctly, and a
collector acknowledgement to the "go ahead" message is not required,
then we incur a vulnerability to (collector host) system crash from
the time the "go ahead" message is accepted by the host until the
time the data is totally processed.  Call the data processing time P.
Once the data generator has selected a particular collector (on the
basis of receiving its "I've got it" message), we also incur a
vulnerability to malfunction of this collector process.  The
vulnerable period is from the time the collector sends its "I've got
it" message until the time the data is processed.  This amounts to
two network transit times (2N) plus IMP and host overhead for message
delivery (O) plus data processing time (P).  [Total time = 2N + O +
P.]  A malfunction (crash) in this period can cause the loss of data.
There is no potential for duplication.
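
     As an illustration of this interval, the arithmetic below uses
assumed component times; they are not measurements from the original
system:

    # Illustrative only: N, O and P are assumed values in seconds.
    N = 0.25   # one network transit time
    O = 0.10   # IMP and host overhead for message delivery
    P = 0.50   # data processing time at the collector

    # Vulnerable period: from the collector's "I've got it" until
    # the data is fully processed.
    total = 2 * N + O + P
    print(total)   # 1.1 seconds under these assumptions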

     Now, assume that the data collector process must acknowledge the
"go ahead" message.  The question then arises as to when such an
acknowledgement should be sent.  The reasonable choices are either
immediately before final processing of the data (i.e. before the data
is permanently recorded) or immediately after final processing.  It
can be argued that unless another acknowledgement is required (by the
generator to the collector) to this acknowledgement BEFORE the actual
data update, then the best time for the collector to acknowledge the
"go ahead" is after final processing.  This is so because receiving
the acknowledgement conveys more information if it is sent after
processing, while not receiving it (timeout), in either case, leaves
us in an unknown state with respect to the data update.  Depending on
the relative speeds of various network and system components, the
data may or may not be permanently entered.  Therefore, if we
interpret the timeout as a signal to have the data processed at
another site, we run the risk of duplication of data.  To avoid data
duplication, the timeout strategy must only involve re-sending the
"go ahead" message to the same collector.  This will only help if the
lack of reply is due to a lost network message.  Our vulnerability
intervals to system and process malfunction remain as before.
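
     The resulting timeout rule can be sketched as follows.  The
confirm routine and its ack_within primitive are hypothetical; only
the rule itself (re-send the "go ahead" to the same collector, never
to another) comes from the text:

    # Duplication-safe handling of a missing acknowledgement to the
    # "go ahead": retry the same collector only, since redirecting
    # to another collector risks recording the data twice.

    def confirm(collector, data, send, ack_within, timeout, tries):
        for _ in range(tries):
            send(collector, "GO AHEAD", data)
            if ack_within(collector, timeout):
                return True    # collector reports the data is recorded
        return False           # state unknown; the update may have happened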

     It is our conjecture (to be analyzed further) that any further
acknowledgements to these acknowledgements will have virtually no
effect on reducing the period of vulnerability outlined above.  As
such, the protocol with the fewest messages required is superior.


Data Dependent Aspects of the Protocol


     As discussed above, a main issue is which process should be the
last to respond (send an acknowledgement).  If the data generator
sends the last message (i.e. "go ahead"), we can only check on its
correct arrival at the destination host.  We must "take on faith" the
ability of the collector to correctly complete the transaction.  This
strategy is geared toward avoiding data duplication.  If, on the
other hand, the protocol specifies that the collector is to send the
last message, with the timeout of such a message causing the data
generator to use another collector, then the protocol is geared
toward best efforts at recording the data somewhere, at the expense
of possible duplication.

     Thus, the nature of the problem will dictate which of the
protocols is appropriate for a given situation.  The next section
deals in the specifics of an implementation of a data collection
protocol to handle the problem of collecting TIP accounting data by
using the TENEX systems for running the collection server processes.
It is shown how the general protocol is optimized for the accounting
data collection.




                              Section II

     Protocol for TIP-TENEX Accounting Server Information Exchange


Overview of the Facility


     When a user initially requests service from a TIP, the TIP will
perform a broadcast ICP to find an available RSEXEC, which maintains
an authentication data base.  The user must then complete a login
sequence in order to authenticate himself.  If he is successful, the
RSEXEC will transmit his unique ID code to the TIP.  Failure will
cause the RSEXEC to close the connection and the TIP to hang up on
the user.  After the user is authenticated, the TIP will accumulate
accounting data for the user session.  The data includes a count of
messages sent on behalf of the user, and the connect time for the
user.  From time to time the TIP will transmit intermediate
accounting data to Accounting Server (ACTSER) processes scattered
throughout the network.  These accounting servers will maintain files
containing intermediate raw accounting data.  The raw accounting data
will periodically be collected and sorted to produce an accounting
data base.  Providing a number of accounting servers reduces the
possibility of being unable to find a repository for the intermediate
data, which otherwise would be lost due to buffering limitations in
the TIPs.  The multitude of accounting servers can also serve to
reduce the load on the individual hosts providing this facility.

     The rest of this document details the protocol that has been
developed to ensure delivery of TIP accounting data to one of the
available accounting servers for storage in the intermediate
accounting files.


Adapting the Protocol


     The TIP to Accounting Server data exchange uses a protocol that
allows the TIP to select for data transmission one, some, or all
server hosts, either sequentially or in parallel, yet ensures that
the data that becomes part of the accounting file does not contain
duplicate information.  The protocol also minimizes the amount of
data buffering that must be done by the limited capacity TIPs.  The
protocol is applicable to a wide class of data collection problems
which use a number of data generators and collectors.  The following
describes how the protocol works for TIP accounting.

     Each TIP is responsible for maintaining in its memory the cells
indicating the connect time and the number of messages sent for each
of its current users.  These cells are incremented by the TIP for
every quantum of connect time and message sent, as the case may be.
This is the data generation phase.  Periodically, the TIP will scan
all its active counters and, along with each user ID code, pack the
accumulated data into one network message (i.e. less than 8K bits).
The TIP then transmits this data to a set of Accounting Server
processes residing throughout the network.  The data transfer is over
a specially designated host-host link.  The accounting servers
utilize the raw network message facility of TENEX 1.32 in order to
directly access that link.  When an ACTSER receives a data message
from a TIP, it buffers the data and replies by returning the entire
message to the originating TIP.  The TIP responds with a positive
acknowledgement ("go ahead") to the first ACTSER which returns the
data, and responds with a negative acknowledgement ("discard") to all
subsequent ACTSER data return messages for this series of transfers.
If the TIP does not receive a reply from any ACTSER, it accumulates
new data (i.e. the TIP has all the while been incrementing its local
counters to reflect the increased connect time and message count; the
current values will comprise the new data transfers) and sends the
new data to the Accounting Server processes.  When an ACTSER receives
a positive acknowledgement from a TIP (i.e. "go ahead"), it appends
the appropriate parts of the buffered data to the locally maintained
accounting information file.  On receiving a negative acknowledgement
from the TIP (i.e. "discard"), the ACTSER discards the data buffered
for this TIP.  In addition, when the TIP responds with a "go ahead"
to the first ACTSER which has accepted the data (acknowledged by
returning the data along with the "I've got it"), the TIP decrements
the connect time and message counters for each user by the amount
indicated in the data returned by the ACTSER.  This data will already
be accounted for in the intermediate accounting files.
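
     The accounting server's side of this exchange can be sketched as
below.  The sketch is a simplification under assumed names
(on_message, send, and the file name are illustrative); the TENEX raw
message mechanics are omitted:

    # Hypothetical sketch of ACTSER behavior: buffer and echo data,
    # file it on "go ahead", drop it on "discard".

    buffered = {}    # data held per TIP, pending the TIP's decision

    def file_data(message):
        # Append to the locally maintained intermediate accounting
        # file; the file name is illustrative.
        with open("actser.intermediate", "a") as f:
            f.write(repr(message) + "\n")

    def on_message(tip, kind, message, send):
        if kind == "DATA":
            buffered[tip] = message       # hold the data...
            send(tip, "ECHO", message)    # ...and echo it ("I've got it")
        elif kind == "GO AHEAD" and tip in buffered:
            file_data(buffered.pop(tip))  # make the data permanent
        elif kind == "DISCARD":
            buffered.pop(tip, None)       # collected elsewhere; drop copy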

     As an aid in determining which ACTSER replies are to current
requests, and which are tardy replies to old requests, the TIP
maintains a sequence number indicator, and appends this number to
each data message sent to an ACTSER.  On receiving a reply from an
ACTSER, the TIP merely checks the returned sequence number to see if
this is the first reply to the current set of TIP requests.  If the
returned sequence number is the same as the current sequence number,
then this is the first reply; a positive acknowledgement is sent off,
the counters are decremented by the returned data, and the sequence
number is incremented.  If the returned sequence number is not the
same as the current one (i.e. not the one we are now seeking a reply
for), then a negative acknowledgement is sent to the replying ACTSER.
After a positive acknowledgement to an ACTSER (and the implied
incrementing of the sequence number), the TIP can wait for more
information to accumulate, and then start transmitting again using
the new sequence number.
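
     The TIP's reply handling reduces to a comparison of sequence
numbers.  The sketch below is illustrative; the counter layout and
the send primitive are assumptions, and only the sequence number
logic follows the text directly:

    # Hypothetical sketch of the TIP's handling of an ACTSER reply.

    seq = 0         # current sequence number (16-bit; see point 6 below)
    counters = {}   # per-user: (connect time, message count)

    def on_reply(actser, returned_seq, returned_data, send):
        global seq
        if returned_seq == seq:
            # First reply to the current set of requests.
            send(actser, "GO AHEAD", returned_seq)
            for user, (time_used, msgs) in returned_data.items():
                t, m = counters.get(user, (0, 0))
                counters[user] = (t - time_used, m - msgs)  # accounted for
            seq = (seq + 1) % (2 ** 16)   # later replies become tardy
        else:
            # Tardy reply to an old request; that data is already
            # accounted for (or superseded) elsewhere.
            send(actser, "DISCARD", returned_seq)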


Further Clarification of the Protocol


     There are a number of points concerning the protocol that should
be noted.

     1. The data generator (TIP) can send different (i.e. updated)
versions of the data to different data collectors (accounting
servers) as part of the same logical transmission sequence.  This is
possible because the TIP does not account for the data sent until it
receives the acknowledgement of the data echo.  This strategy
relieves the TIP of any buffering in conjunction with re-transmission
of data which has not been acknowledged.

     2. A new data request to an accounting server from a TIP will
also serve as a negative acknowledgement concerning any data already
buffered by the ACTSER for that TIP, but not yet acknowledged.  The
old data will be discarded, and the new data will be buffered and
echoed as an acknowledgement.  This allows the TIP the option of not
sending a negative acknowledgement when it is not convenient to do
so, without having to remember that it must be sent at a later time.
There is one exception to this convention.  If the new data message
has the same sequence number as the old buffered message, then the
new data must be discarded, and the old data kept and re-echoed.  (A
sketch of this buffering rule appears after this list.)  This is to
prevent a slow acknowledgement to the old data from being accepted by
the TIP after the TIP has already sent the new data to the slow host.
This caveat can be avoided if the TIP does not resend to a
non-responding server within the time period that a message could
possibly be stuck in the network but still be delivered.  Ignoring
this situation may result in some accounting data being counted
twice.  Because of the rule to keep old data when confronted with
matching sequence numbers, on restarting after a crash the TIP should
send a "discard" message to all servers in order to clear any data
which was buffered for it prior to the crash.  An alternative would
be for the TIP to initialize its sequence number from a varying
source such as the time of day.

     3. The accounting server similarly need not acknowledge receipt
of data (by echoing) if it finds itself otherwise occupied.  This
will mean that the ACTSER is not buffering the data, and hence is not
a candidate for entering the data into the file.  However, the TIP
may try this ACTSER at a later time (even with the same data), with
no ill effects.

     4. Because of 2 and 3 above, the protocol is robust with respect
to lost or garbled transmissions of TIP data requests and accounting
server echo replies.  That is, in the event of loss of such a
message, a re-transmission will occur as the normal procedure.

     5. There is no synchronization problem with respect to the
sequence number used for duplicate detection, since this number is
maintained only at the TIP site.  The accounting server merely echoes
the sequence number it has received as part of the data.

     6. There are, however, some constraints on the size of the
sequence number field.  It must be large enough so that ALL traces of
the previous use of a given sequence number are totally removed from
the network before the number is re-used by the TIP.  The sequence
number is cyclic, taken modulo the largest number representable in
the bits allocated to the field.  Problems generally arise when a
host resumes after a service interruption while it was holding on to
a reply.  If, during the service interruption, we have cycled through
our sequence numbers exactly N times (where N is any integer), this
VERY tardy reply could be mistaken for a reply to the new data, which
has the same sequence number (i.e. N revolutions of sequence numbers
later).  By utilizing a sufficiently large sequence number field (16
bits), and by allowing sufficient time between instances of sending
new data, we can effectively reduce the probability of such an error
to zero.

     7. Since the data involved in this problem is the source of
accounting information, care must be taken to avoid duplicate
entries.  This must be done at the expense of potentially losing data
in certain instances.  Other than the obvious TIP malfunction, there
are two known ways of losing data.  One is the situation where no
accounting server responds to a TIP for an extended period of time,
causing the TIP counters to overflow (highly unlikely if there are
sufficient Accounting Servers).  In this case, the TIP can hold the
counters at their maximum value until a server comes up, thereby
keeping the lost accounting data at its minimum.  The other situation
results from adapting the protocol to our insistence on no duplicate
data in the incremental files.  We are vulnerable to data loss, with
no recourse, from the time the server receives the "go ahead" to
update the file with the buffered data (i.e. positive
acknowledgement) until the time the update is completed and the file
is closed.  An accounting server crash during this period will cause
that accounting data to be lost.  In our initial implementation, we
have slightly extended this period of vulnerability in order to save
the TIP from having to buffer the acknowledged data for a short
period of time.  By updating TIP counters from the returned data in
parallel with sending the "go ahead" acknowledgement, we relieve the
TIP of the burden of buffering this data until the Request for Next
Message (RFNM) from the accounting server IMP is received.  This adds
slightly to our period of vulnerability to malfunction, moving the
beginning of the period from the point when the ACTSER host receives
the "go ahead" back to the point when the TIP sends off the "go
ahead" (i.e. a period of one network transit time plus some IMP
processing time).  However, loss of data in this period is detectable
through the Host Dead or Incomplete Transmission return in place of
the RFNM.  We intend to record such occurrences with the Network
Control Center.  If this data loss becomes intolerable, the TIP
program will be modified to await the RFNM for the positive
acknowledgement before updating its counters.  In such a case, if the
RFNM does not come, the TIP can discard the buffered data and
re-transmit new data to other servers.

     8. There is adequate protection against the entry of forged data
into the intermediate accounting files.  This is primarily due to the
system-enforced limited access to Host-IMP messages and Host-Host
links.  In addition, messages received on such designated limited
access links can be easily verified as coming from a TIP.  The IMP
subnet appends the signature (address) of the sending host to all of
its messages, so there can be no forging.  The Accounting Server is
in a position to check whether the source of the message is in fact a
TIP data generator.
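
     The buffering rule of point 2, including its same-sequence-
number exception, can be sketched as follows (again with hypothetical
names; only the rule itself comes from the text):

    # Hypothetical sketch of the ACTSER rule for a new data message:
    # new data displaces old buffered data, except when the sequence
    # numbers match, in which case the old data is kept and re-echoed.

    buffered = {}    # per TIP: (sequence number, data)

    def on_data(tip, seq, data, send):
        held = buffered.get(tip)
        if held is not None and held[0] == seq:
            send(tip, "ECHO", held[1])    # keep and re-echo the old data
        else:
            buffered[tip] = (seq, data)   # implicit "discard" of old data
            send(tip, "ECHO", data)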


Current Parameters of the Protocol


     In the initial implementation, the TIP sends its accumulated
accounting data about once every half hour.  If it gets no positive
acknowledgement, it tries to send with greater frequency (about every
5 minutes) until it finally succeeds.  It can then return to the
normal waiting period.  (A TIP user logout introduces an exception to
this behavior.  In order to re-use the TIP port and its associated
counters as soon as possible, a user terminating his TIP session
causes the accounting data to be sent immediately.)  Initially, our
implementation calls for each TIP to remember a "favored" accounting
server.  At the wait period expiration, the TIP will try to deposit
the data at its "favored" site.  If it is successful within a short
timeout period, this site remains the favored site, and the wait
interval is reset.  If it is unsuccessful within the short timeout,
the data can be sent to all servers*.  The one replying first will
update its file with the data and also become the "favored" server
for this TIP.  With these parameters, a host would have to undergo a
service interruption of more than a year, while holding a reply, in
order for the potential sequence number problem outlined in (6) above
to occur.

--------------------

* The sequence number can be incremented for this new set of data
messages, and the new data can also be sent to the slow host.  In
this way we won't be giving the tardy response from the old favored
host an unfair advantage in determining which server can respond most
quickly.  If there is no reply to this series of messages, the TIP
can continue to resend the new data.  However, the sequence number
should not be incremented, since no reply was received, and since
indiscriminate incrementing of the sequence number increases the
chance of recycling during the lifetime of a message.
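
     As a rough check of that claim, the arithmetic below assumes the
normal half-hour sending period given above; a 16-bit sequence field
then takes several years to cycle:

    # Illustrative only: time for a 16-bit sequence number to be
    # re-used at one data message per half hour.
    field_bits = 16
    period_hours = 0.5
    years_to_wrap = (2 ** field_bits) * period_hours / (24 * 365)
    print(years_to_wrap)   # about 3.7 years between re-uses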


Concluding Remarks


     When the implementation is complete, we will have a general data
accumulation and collection system which can be used to gather a wide
variety of information.  The protocol as outlined is geared to
gathering data which is either independent of the previously
accumulated data items (e.g. recording names), or data which adheres
to a commutative relationship (e.g. counting).  This is a consequence
of the policy of retransmission of different versions of the data to
different potential collectors (to relieve TIP buffering problems).

     In the specified version of the protocol, care was taken to
avoid duplicate data entries, at the cost of possibly losing some
data through collector malfunction.  Data collection problems which
require avoiding such loss (at the cost of possible duplication of
some data items) can easily be accommodated with a slight adjustment
to the protocol.  Collected data which does not adhere to the
commutative relationship indicated above can also be handled, by
utilizing more buffer space at the data generator sites.